Neural Net Project - Solutions

Let's wrap up this course by taking a quick look at the effectiveness of Neural Nets!

We'll use the Bank Authentication Data Set from the UCI repository.

The data consists of 5 columns:

  • variance of Wavelet Transformed image (continuous)
  • skewness of Wavelet Transformed image (continuous)
  • curtosis of Wavelet Transformed image (continuous)
  • entropy of image (continuous)
  • class (integer)

Here, class indicates whether or not a Bank Note was authentic.



Get the Data

Use read.csv to read the bank_note_data.csv file.

In [122]:
df <- read.csv('bank_note_data.csv')

Check the head of the data frame and its structure.

In [123]:
head(df)
Out[123]:
  Image.Var  Image.Skew  Image.Curt   Entropy  Class
1    3.6216      8.6661     -2.8073  -0.44699      0
2    4.5459      8.1674     -2.4586  -1.4621       0
3    3.866      -2.6383      1.9242   0.10645      0
4    3.4566      9.5228     -4.0112  -3.5944       0
5    0.32924    -4.4552      4.5718  -0.9888       0
6    4.3684      9.6718     -3.9606  -3.1625       0
In [125]:
str(df)
'data.frame':	1372 obs. of  5 variables:
 $ Image.Var : num  3.622 4.546 3.866 3.457 0.329 ...
 $ Image.Skew: num  8.67 8.17 -2.64 9.52 -4.46 ...
 $ Image.Curt: num  -2.81 -2.46 1.92 -4.01 4.57 ...
 $ Entropy   : num  -0.447 -1.462 0.106 -3.594 -0.989 ...
 $ Class     : int  0 0 0 0 0 0 0 0 0 0 ...
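
Before moving on, it can help to confirm that the file loaded the way you expect. The sketch below rebuilds the six rows shown by head(df) as a small stand-in data frame (df_head is a name introduced here for illustration) and runs two checks that work the same way on the full df: counting missing values per column and tabulating the classes.

```r
# Stand-in data frame built from the six rows printed by head(df) above.
# df_head is a hypothetical name; on the real data, use df instead.
df_head <- data.frame(
  Image.Var  = c(3.6216, 4.5459, 3.866, 3.4566, 0.32924, 4.3684),
  Image.Skew = c(8.6661, 8.1674, -2.6383, 9.5228, -4.4552, 9.6718),
  Image.Curt = c(-2.8073, -2.4586, 1.9242, -4.0112, 4.5718, -3.9606),
  Entropy    = c(-0.44699, -1.4621, 0.10645, -3.5944, -0.9888, -3.1625),
  Class      = c(0L, 0L, 0L, 0L, 0L, 0L)
)

# Number of missing values in each column
na_counts <- colSums(is.na(df_head))
print(na_counts)

# How many rows fall into each class
class_counts <- table(df_head$Class)
print(class_counts)
```

On the full data set, table(df$Class) is a quick way to see how balanced the two classes are before you split.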

EDA

Create whatever visualizations you are interested in. We'll skip this step for the solutions notebook/video because the data isn't easily interpretable, since it's just statistical info on images.
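
If you do want a quick look, one option is to compare each feature across the two classes. Here is a minimal sketch using a small made-up data frame (toy holds invented values, not the real bank note data) so that it runs on its own; swap in df to try it on the actual file.

```r
# Toy data with the same column names as df. The values are invented
# purely for illustration and are NOT taken from bank_note_data.csv.
toy <- data.frame(
  Image.Var  = c(3.1, 2.8, 4.0, -1.2, -2.5, -0.7),
  Image.Skew = c(8.0, 6.5, 9.1, -3.0, 1.2, -4.4),
  Image.Curt = c(-2.0, -1.1, -3.5, 4.2, 2.9, 5.0),
  Entropy    = c(-0.5, -1.0, -3.0, 0.2, -0.9, 0.4),
  Class      = c(0L, 0L, 0L, 1L, 1L, 1L)
)

# Mean of every feature within each class
class_means <- aggregate(. ~ Class, data = toy, FUN = mean)
print(class_means)

# Base-graphics boxplot of one feature, split by class
boxplot(Image.Var ~ Class, data = toy,
        xlab = "Class", ylab = "Image.Var")
```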

Train Test Split

Use the caTools library to split the data into training and testing sets.

In [127]:
library(caTools)
set.seed(101)
split = sample.split(df$Class, SplitRatio = 0.70)

train = subset(df, split == TRUE)
test = subset(df, split == FALSE)

Check the structure of the train data and note that Class is still an int data type. We won't convert it to a factor for now because the neural net requires all numeric information.

In [129]:
str(train)
'data.frame':	960 obs. of  5 variables:
 $ Image.Var : num  3.622 4.546 3.457 0.329 4.368 ...
 $ Image.Skew: num  8.67 8.17 9.52 -4.46 9.67 ...
 $ Image.Curt: num  -2.81 -2.46 -4.01 4.57 -3.96 ...
 $ Entropy   : num  -0.447 -1.462 -3.594 -0.989 -3.163 ...
 $ Class     : int  0 0 0 0 0 0 0 0 0 0 ...
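
Notice that sample.split is given the label column rather than the whole data frame. That lets it keep the proportion of each class roughly the same in the training and testing sets. The base-R sketch below illustrates the idea on made-up labels; stratified_split is a hypothetical helper written for this example and is not how caTools implements sample.split.

```r
# Hypothetical helper: returns a logical vector, TRUE for rows that go to
# the training set, sampling the requested ratio within each class.
stratified_split <- function(y, ratio) {
  in_train <- logical(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    n_train <- round(length(idx) * ratio)
    # Pick positions with sample.int, then map them back to row indices
    chosen <- idx[sample.int(length(idx), n_train)]
    in_train[chosen] <- TRUE
  }
  in_train
}

set.seed(101)
# Made-up labels: 60 zeros and 40 ones
y <- rep(c(0, 1), times = c(60, 40))
mask <- stratified_split(y, 0.70)

# Class proportions in each piece should both be 60/40
print(prop.table(table(y[mask])))
print(prop.table(table(y[!mask])))
```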

Building the Neural Net

Call the neuralnet library

In [130]:
library(neuralnet)

Browse through the documentation of neuralnet

In [140]:
#help(neuralnet)

Use the neuralnet function to train a neural net. Set linear.output=FALSE and choose 10 hidden neurons (hidden=10).

In [141]:
nn <- neuralnet(Class ~ Image.Var + Image.Skew + Image.Curt + Entropy,data=train,hidden=10,linear.output=FALSE)
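
With linear.output=FALSE, the output neuron is passed through the activation function (logistic by default), which keeps every prediction between 0 and 1. To make that concrete, here is a rough sketch of a forward pass through a tiny single-hidden-layer network. The weights are made up and the layer has only 3 hidden neurons to keep things small, so this is not the model stored in nn.

```r
# Logistic (sigmoid) activation
logistic <- function(z) 1 / (1 + exp(-z))

# Made-up weights for a tiny network: 4 inputs -> 3 hidden -> 1 output.
# These are NOT the weights learned by nn; they only illustrate the shape.
set.seed(1)
W_hidden <- matrix(rnorm(5 * 3), nrow = 5)  # 4 inputs + 1 bias row
W_output <- matrix(rnorm(4 * 1), nrow = 4)  # 3 hidden + 1 bias row

forward <- function(x) {
  hidden <- logistic(c(1, x) %*% W_hidden)  # prepend 1 for the bias
  logistic(c(1, hidden) %*% W_output)       # activation applied to the output too
}

# First row of head(df): Image.Var, Image.Skew, Image.Curt, Entropy
out <- forward(c(3.6216, 8.6661, -2.8073, -0.44699))
print(out)
```

Because the last step is a logistic function, the result always lands strictly between 0 and 1, whatever the weights are.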

Predictions

Use compute() to grab predictions using your nn model on the test set. Reference the lecture on how to do this.

In [142]:
predicted.nn.values <- compute(nn,test[,1:4])

Check the head of the predicted values. You should notice that they are still probabilities.

In [143]:
head(predicted.nn.values$net.result)
Out[143]:
3    0.002193454319
11   0.00003652277413
12   0.001859463551
13   0.00000632594144
14   0.00000656587257
17   0.0001286774258

Apply the round function to the predicted values so you only have 0s and 1s as your predicted classes.

In [144]:
predictions <- sapply(predicted.nn.values$net.result,round)
In [145]:
head(predictions)
Out[145]:
  1. 0
  2. 0
  3. 0
  4. 0
  5. 0
  6. 0
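
Rounding here amounts to thresholding the network output at 0.5. The sketch below applies both approaches to the six values printed by head(predicted.nn.values$net.result); probs is a name introduced for this example. One quirk to keep in mind: R's round() rounds halves to the even digit, so an output of exactly 0.5 would become 0.

```r
# The six outputs shown by head(predicted.nn.values$net.result)
probs <- c(0.002193454319, 0.00003652277413, 0.001859463551,
           0.00000632594144, 0.00000656587257, 0.0001286774258)

# Two ways to turn the outputs into class labels
by_round     <- round(probs)
by_threshold <- ifelse(probs > 0.5, 1, 0)

print(by_round)
print(identical(by_round, by_threshold))
```

Writing the threshold out explicitly is handy if you ever want a cutoff other than 0.5.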

Use table() to create a confusion matrix of your predictions versus the real values

In [146]:
table(predictions,test$Class)
Out[146]:
           
predictions   0   1
          0 229   0
          1   0 183

You should have noticed that you did very well! Almost suspiciously well! Let's check our results against a randomForest model!

Comparing Models

Call the randomForest library

In [147]:
library(randomForest)

Run the code below to set the Class column of the data as a factor (randomForest needs it to be a factor, not an int like the neural net did). Then re-do the train/test split.

In [148]:
df$Class <- factor(df$Class)
library(caTools)
set.seed(101)
split = sample.split(df$Class, SplitRatio = 0.70)

train = subset(df, split == TRUE)
test = subset(df, split == FALSE)

Create a randomForest model with the new adjusted training data.

In [149]:
model <- randomForest(Class ~ Image.Var + Image.Skew + Image.Curt + Entropy,data=train)

Use predict() to get the predicted values from your rf model.

In [150]:
rf.pred <- predict(model,test)

Use table() to create the confusion matrix.

In [151]:
table(rf.pred,test$Class)
Out[151]:
       
rf.pred   0   1
      0 227   1
      1   2 182

How did the models compare?
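
One way to answer that is to work out the random forest's accuracy and per-class hit rate from its confusion matrix. The sketch below re-enters the counts from the rf.pred table by hand; rf_cm and the other variable names are introduced for this example.

```r
# Confusion matrix copied from table(rf.pred,test$Class):
# rows are predictions, columns are the true classes.
rf_cm <- matrix(c(227,   1,
                    2, 182),
                nrow = 2, byrow = TRUE,
                dimnames = list(rf.pred = c("0", "1"),
                                actual  = c("0", "1")))

# Overall accuracy and number of misclassified test rows
rf_accuracy <- sum(diag(rf_cm)) / sum(rf_cm)
rf_errors   <- sum(rf_cm) - sum(diag(rf_cm))

# Share of each true class that was predicted correctly
rf_per_class <- diag(rf_cm) / colSums(rf_cm)

print(round(rf_accuracy, 4))
print(rf_errors)
print(round(rf_per_class, 4))
```

The random forest misclassifies 3 of the 412 test rows, while the neural net's table has zeros off the diagonal. Both models do very well on this split, so the neural net's perfect score is at least plausible for this data.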

Great Job!